Explore the data
summary(airquality)
##      Ozone           Solar.R           Wind             Temp
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
##  NA's   :37       NA's   :7
##      Month            Day
##  Min.   :5.000   Min.   : 1.0
##  1st Qu.:6.000   1st Qu.: 8.0
##  Median :7.000   Median :16.0
##  Mean   :6.993   Mean   :15.8
##  3rd Qu.:8.000   3rd Qu.:23.0
##  Max.   :9.000   Max.   :31.0
##
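The summary reports missing values only for Ozone (37) and Solar.R (7). A quick cross-check counts the NAs in every column and expresses each count as a share of the 153 rows. This is a small sketch using base R only. The names na_counts and the percentage step are illustrative additions, not part of the original analysis.

```r
# Count missing values in each column of the built-in airquality data set
na_counts <- colSums(is.na(airquality))
print(na_counts)   # Ozone 37, Solar.R 7, every other column 0

# Share of rows missing each variable, as a percentage
round(100 * na_counts / nrow(airquality), 1)
```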
Structure of the dataset
str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
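Because the NAs are spread over two columns, the number of usable rows depends on which variables an analysis needs. The sketch below counts rows with no missing value at all and the rows usable for a Temp vs. Ozone comparison. Temp itself has no NAs, so only Ozone limits that pair. The object names complete_rows and aq_complete are illustrative.

```r
# Rows with no missing value in any column (needed when every column is used)
complete_rows <- complete.cases(airquality)
sum(complete_rows)

# Rows usable for Temp vs. Ozone: only Ozone has missing values in that pair
sum(!is.na(airquality$Ozone))

# Listwise-deleted copy for analyses that need every column
aq_complete <- na.omit(airquality)
dim(aq_complete)
```

The two counts differ because some rows lack Solar.R while still having Ozone, so dropping every incomplete row discards more data than a Temp vs. Ozone analysis requires.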
Plot of temperature vs. ozone concentration
## Warning: Removed 37 rows containing missing values (`geom_point()`).
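The plotting code itself is not shown. The warning is ggplot2 reporting that geom_point() dropped the 37 rows where Ozone is NA. The following is a minimal sketch, assuming the original was a ggplot2 scatter plot of Temp against Ozone. It removes those rows explicitly, so the plot renders without the warning. The axis labels use the units documented for airquality (degrees Fahrenheit, parts per billion). The object names aq_ozone and p are illustrative.

```r
# Keep only rows where Ozone was recorded (Temp has no missing values)
aq_ozone <- airquality[!is.na(airquality$Ozone), ]

# Draw the scatter plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(aq_ozone, aes(x = Temp, y = Ozone)) +
    geom_point() +
    labs(x = "Maximum daily temperature (degrees F)",
         y = "Mean ozone (ppb)",
         title = "Temperature vs. ozone concentration")
  print(p)
}
```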

Correlation between temperature and ozone concentration
cor(airquality$Temp, airquality$Ozone)
## [1] NA
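The NA result is expected. By default, cor() uses use = "everything", so a single missing Ozone value makes the whole coefficient NA. Passing use = "complete.obs" computes the Pearson coefficient from the rows where both variables are present. cor.test() drops incomplete pairs on its own and adds a confidence interval and p-value. The variable name r_temp_ozone is illustrative.

```r
# Default cor() propagates NAs; drop incomplete rows instead
r_temp_ozone <- cor(airquality$Temp, airquality$Ozone, use = "complete.obs")
r_temp_ozone

# Same Pearson estimate plus a significance test and confidence interval
cor.test(airquality$Temp, airquality$Ozone)
```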
## Warning: Removed 42 rows containing missing values (`geom_point()`).

cor_matrix <- cor(airquality[, c("Ozone", "Solar.R", "Wind", "Temp", "Month", "Day")])
print(cor_matrix)
##         Ozone Solar.R       Wind       Temp        Month          Day
## Ozone       1      NA         NA         NA           NA           NA
## Solar.R    NA       1         NA         NA           NA           NA
## Wind       NA      NA  1.0000000 -0.4579879 -0.178292579  0.027180903
## Temp       NA      NA -0.4579879  1.0000000  0.420947252 -0.130593175
## Month      NA      NA -0.1782926  0.4209473  1.000000000 -0.007961763
## Day        NA      NA  0.0271809 -0.1305932 -0.007961763  1.000000000
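Every entry involving Ozone or Solar.R is NA for the same reason as the single correlation above. There are two common fixes, sketched below. With pairwise deletion, each coefficient uses the rows where that particular pair is complete. With listwise deletion, every coefficient uses the same 111 fully complete rows. The names cor_pairwise and cor_complete are illustrative.

```r
# Pairwise deletion: each entry uses the rows where both variables are present
cor_pairwise <- cor(airquality, use = "pairwise.complete.obs")
round(cor_pairwise, 2)

# Listwise deletion: every entry uses the same complete rows
cor_complete <- cor(airquality, use = "complete.obs")
round(cor_complete, 2)
```

Pairwise deletion keeps more data per coefficient. However, the entries rest on different subsets of rows, so the resulting matrix is not guaranteed to be positive semi-definite. Listwise deletion avoids that issue at the cost of discarding more rows.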
Load plotly for an interactive plot (attaching it masks last_plot() from ggplot2, filter() from stats and layout() from graphics)
library(plotly)
## Warning: Ignoring 37 observations
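This warning is plotly skipping the 37 rows with a missing Ozone value. The plot_ly() call is not shown. The sketch below assumes an Ozone vs. Temp scatter plot, drops the NA rows before plotting and colours points by month. The month labels come from base R's month.abb. The names aq_ozone and fig and the colour-by-month choice are illustrative assumptions. Functions are called with the plotly:: prefix so the masked layout() and filter() cannot cause confusion.

```r
# Keep rows where Ozone was recorded, so plotly has nothing to ignore
aq_ozone <- airquality[!is.na(airquality$Ozone), ]

# Label months 5 to 9 as May to Sep for a readable legend
aq_ozone$Month <- factor(aq_ozone$Month, levels = 5:9, labels = month.abb[5:9])

if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(aq_ozone, x = ~Temp, y = ~Ozone, color = ~Month,
                         type = "scatter", mode = "markers")
  fig <- plotly::layout(fig,
                        xaxis = list(title = "Temperature (degrees F)"),
                        yaxis = list(title = "Ozone (ppb)"))
  # Evaluate fig in an interactive session to display the widget
}
```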